Learning with Taxonomies: Classifying Documents and Words

Authors

  • Thomas Hofmann
  • Lijuan Cai
Abstract

Automatically extracting semantic information about word meaning and document topic from text typically involves a large number of classes. Such classes may represent predefined word senses, topics, or document categories and are often organized in a taxonomy. The taxonomy encodes important information that should be exploited when learning classifiers from labeled training data. To that end, this paper presents an extension of multiclass Support Vector Machine learning that can incorporate prior knowledge about class relationships. This knowledge can be encoded in the form of class attributes, similarities between classes, or even a kernel function defined over the set of classes. The paper also discusses how to specify and optimize meaningful loss functions based on the relative position of classes in the taxonomy. We include experimental results for text categorization and for word sense classification.
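To make the ideas in the abstract concrete, here is a minimal sketch of one way a taxonomy could supply both a loss function and class attributes. It is not the paper's formulation: the toy taxonomy, the function names (`tree_loss`, `class_attributes`, `class_kernel`), the choice of path length as the loss, and the ancestor-indicator encoding are all assumptions made for illustration.

```python
# Hypothetical toy taxonomy, given as child -> parent (the root has parent None).
PARENT = {
    "root": None,
    "science": "root",
    "sports": "root",
    "physics": "science",
    "biology": "science",
    "soccer": "sports",
}


def ancestors(node):
    """Return the path from node up to the root, with node itself first."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def tree_loss(true_class, predicted_class):
    """Assumed loss: number of edges between the two classes in the tree."""
    up_true = ancestors(true_class)
    up_pred = ancestors(predicted_class)
    common = set(up_true) & set(up_pred)
    # Count the steps from each class up to the lowest common ancestor.
    d_true = next(i for i, n in enumerate(up_true) if n in common)
    d_pred = next(i for i, n in enumerate(up_pred) if n in common)
    return d_true + d_pred


def class_attributes(node):
    """Assumed encoding: 1 for every taxonomy node on the path to the root."""
    on_path = set(ancestors(node))
    return [1 if n in on_path else 0 for n in sorted(PARENT)]


def class_kernel(a, b):
    """Inner product of attribute vectors, i.e. the number of shared ancestors."""
    return sum(x * y for x, y in zip(class_attributes(a), class_attributes(b)))


if __name__ == "__main__":
    print(tree_loss("physics", "biology"))     # siblings under "science"
    print(tree_loss("physics", "soccer"))      # classes in different subtrees
    print(class_kernel("physics", "biology"))  # shared nodes on the root path
```

Under these assumptions, confusing two sibling classes costs less than confusing classes from different subtrees, and classes that share more ancestors receive a larger kernel value, which is the kind of class-relationship information the abstract says should be exploited.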

Similar resources

Using Fuzzy LR Numbers in Bayesian Text Classifier for Classifying Persian Text Documents

Text classification is an important research field in information retrieval and text mining. The main task in text classification is to assign text documents to predefined categories based on the documents’ contents and labeled training samples. Since word detection is a difficult and time-consuming task in the Persian language, a Bayesian text classifier is an appropriate approach to deal with different...

Ontology Based Machine Learning for Semantic Multiclass Classification

Following the development of semantic web technologies, many ontologies and thesauri have been proposed to index resources during the last decade. However, despite their expressiveness, those knowledge models do not always cover all the points of interest within dedicated applications. Therefore, alternative ad hoc taxonomies have been developed to support resource classification processes. This ...

Unsupervised Concept Hierarchy Induction: Learning the Semantics of Words

Unsupervised concept hierarchy induction, or taxonomy learning, is the task of hierarchically classifying word senses in order to develop a taxonomy of concepts. Taxonomies of concepts such as the one found in WordNet (Fellbaum, 1998) are important resources for a variety of Natural Language Processing (NLP) tasks, including word sense disambiguation (Ramakrishnan et al., 2004; Navigli & Velardi, 2004...

FOLKSONOMY - SUPPLEMENTING RICHE EXPERT BASED TAXONOMY BY TERMS FROM ONLINE DOCUMENTS (Pilot Study)

RICHE (Research Inventory of Child Health in Europe) is a platform developed and funded under the Health domain of 7th European Framework Program. The platform search engine is expected to use the multilingual taxonomy of terms for processing and classifying large volumes of documents of the RICHE repository. So far the experts participating in this project have produced the initial version of ...

Publication date: 2003